Goto

Collaborating Authors

 performance bug


A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

Neural Information Processing Systems

The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change.


PerfBench: Can Agents Resolve Real-World Performance Bugs?

Garg, Spandan, Moghaddam, Roshanak Zilouchian, Sundaresan, Neel

arXiv.org Artificial Intelligence

Performance bugs are inefficiencies in software that waste computational resources without causing functional failures, making them particularly challenging to detect and fix. While recent advances in Software Engineering agents have shown promise in automated bug fixing, existing benchmarks primarily focus on functional correctness and fail to evaluate agents' abilities to identify and resolve non-functional issues like performance bugs. We introduce PerfBench, a benchmark comprising 81 real-world performance bug-fixing tasks from popular .NET repositories on GitHub. Unlike existing benchmarks that rely on pre-existing test suites, PerfBench features a novel evaluation harness that allows agents to generate their own performance benchmarks and validates fixes by comparing execution metrics collected for developer fix and agent fix. Each task in PerfBench is derived from actual developer fixes linked to performance-related issues, which are then verified by human experts, ensuring real-world relevance. Our evaluation reveals that current state-of-the-art coding agents struggle with performance optimization tasks, with baseline OpenHands agent achieving only a ~3% success rate on our benchmark. We develop OpenHands-Perf-Agent, which incorporates performance-aware tooling and instructions and achieves a ~20% success rate on the benchmark. We show that by ensuring the agent has proper instructions to benchmark its changes and tooling for benchmark output processing, we can improve the agent performance significantly, but room for improvement still remains. PerfBench provides a challenging test set for furthering the capabilities of agents in fixing performance issues.



A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

Neural Information Processing Systems

The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We demonstrate AutoPerf's generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs.


Reviews: A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

Neural Information Processing Systems

STRONG POINTS/CONTRIBUTIONS 1) The false positive rates and false negative rates observed when using AutoPerf are impressively low. NEGATIVE POINTS 1) The paper lacks a lot of technical depth and novelty… autoencoders for anomaly detection are widely used, and the problem domain (detecting performance bugs) has been studied previously as well. Knowing what was changed in the code between P_i and P_i 1 could be very, very helpful. DETAILED COMMENTS One comment is that I'm not sure it makes a lot of sense to train separate autoencoders for each function (or group of functions, if you are doing the k-means thing). Likely, there are going to be certain characteristics of the distributions that are shares across all functions, and I worry that you are wasting a lot of compute power by relearning everything.


A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

Neural Information Processing Systems

The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We demonstrate AutoPerf's generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs.


RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot

Garg, Spandan, Moghaddam, Roshanak Zilouchian, Sundaresan, Neel

arXiv.org Artificial Intelligence

Performance bugs are non-functional bugs that can even manifest in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge-base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model (such as Codex) in zero-shot to generate a fix. We compare our approach with the various prompt variations and state of the art methods in the task of performance bug fixing. Our evaluation shows that RAPGen can generate performance improvement suggestions equivalent or better than a developer in ~60% of the cases, getting ~39% of them verbatim, in an expert-verified dataset of past performance changes made by C# developers.


Understanding Bugs in Multi-Language Deep Learning Frameworks

Li, Zengyang, Wang, Sicheng, Wang, Wenshuo, Liang, Peng, Mo, Ran, Li, Bing

arXiv.org Artificial Intelligence

Deep learning frameworks (DLFs) have been playing an increasingly important role in this intelligence age since they act as a basic infrastructure for an increasingly wide range of AIbased applications. Meanwhile, as multi-programming-language (MPL) software systems, DLFs are inevitably suffering from bugs caused by the use of multiple programming languages (PLs). Hence, it is of paramount significance to understand the bugs (especially the bugs involving multiple PLs, i.e., MPL bugs) of DLFs, which can provide a foundation for preventing, detecting, and resolving bugs in the development of DLFs. To this end, we manually analyzed 1497 bugs in three MPL DLFs, namely MXNet, PyTorch, and TensorFlow. First, we classified bugs in these DLFs into 12 types (e.g., algorithm design bugs and memory bugs) according to their bug labels and characteristics. Second, we further explored the impacts of different bug types on the development of DLFs, and found that deployment bugs and memory bugs negatively impact the development of DLFs in different aspects the most. Third, we found that 28.6%, 31.4%, and 16.0% of bugs in MXNet, PyTorch, and TensorFlow are MPL bugs, respectively; the PL combination of Python and C/C++ is most used in fixing more than 92% MPL bugs in all DLFs. Finally, the code change complexity of MPL bug fixes is significantly greater than that of single-programming-language (SPL) bug fixes in all the three DLFs, while in PyTorch MPL bug fixes have longer open time and greater communication complexity than SPL bug fixes. These results provide insights for bug management in DLFs.


Characterizing Performance Bugs in Deep Learning Systems

Cao, Junming, Chen, Bihuan, Sun, Chao, Hu, Longjie, Peng, Xin

arXiv.org Artificial Intelligence

Deep learning (DL) has been increasingly applied to a variety of domains. The programming paradigm shift from traditional systems to DL systems poses unique challenges in engineering DL systems. Performance is one of the challenges, and performance bugs(PBs) in DL systems can cause severe consequences such as excessive resource consumption and financial loss. While bugs in DL systems have been extensively investigated, PBs in DL systems have hardly been explored. To bridge this gap, we present the first comprehensive study to characterize symptoms, root causes, and introducing and exposing stages of PBs in DL systems developed in TensorFLow and Keras, with a total of 238 PBs collected from 225 StackOverflow posts. Our findings shed light on the implications on developing high performance DL systems, and detecting and localizing PBs in DL systems. We also build the first benchmark of 56 PBs in DL systems, and assess the capability of existing approaches in tackling them. Moreover, we develop a static checker DeepPerf to detect three types of PBs, and identify 488 new PBs in 130 GitHub projects.62 and 18 of them have been respectively confirmed and fixed by developers.


Tool Finds Software Update Update Bugs In Hours, Not Days - aster.cloud

#artificialintelligence

It's a common frustration--software updates intended to make our applications run faster inadvertently end up doing just the opposite. These bugs, called performance regressions in the field of computer science, are time-consuming to fix because locating software errors normally requires substantial human intervention. To overcome this obstacle, researchers at Texas A&M University, in collaboration with computer scientists at Intel Labs, developed a completely automated way of identifying the source of the errors. Their algorithm, based on a specialized form of machine learning called deep learning, is not only turnkey, but also quick. It finds performance bugs in a matter of a few hours instead of days.